A Bootstrapping Approach to Identifying Relevant Tweets for Social TV

Authors

  • Ovidiu Dan
  • Junlan Feng
  • Brian D. Davison
Abstract

Manufacturers of TV sets have recently started adding social media features to their products. Some of these products display microblogging messages relevant to the TV show that the user is currently watching. However, such systems suffer from low precision and recall when they use the title of the show to search for relevant messages. Titles of some popular shows, such as Lost or Survivor, are highly ambiguous, so a title search returns messages unrelated to the show. Thus, there is a need for filtering algorithms that achieve both high precision and high recall. Filtering microblogging messages for Social TV poses several challenges, including a lack of training data, a lack of proper grammar and capitalization, and a lack of context due to text sparsity. We describe a bootstrapping algorithm that uses a small manually labeled dataset, a large dataset of unlabeled messages, and some domain knowledge to derive a high-precision classifier that can successfully filter microblogging messages that discuss television shows. The classifier is designed to generalize to TV shows that were not part of the training set. The algorithm achieves high precision on our two test datasets and successfully generalizes to unseen television shows. Furthermore, it compares favorably to a text classifier specifically trained on the television shows used for testing.
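
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general bootstrapping (self-training) idea it names: train on a small labeled seed, label the unlabeled messages the model is confident about, and retrain. The Naive Bayes scorer, the `threshold` and `rounds` parameters, all function names, and the toy messages are assumptions made for illustration. They are not taken from the paper, and the domain-knowledge component mentioned in the abstract is not modeled here.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; tweets often lack reliable capitalization.
    return text.lower().split()


def train(labeled):
    # Fit a tiny Naive Bayes model from (text, is_relevant) pairs.
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, label in labeled:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def prob_relevant(model, text):
    # Probability that a message is about the show, with add-one smoothing.
    counts, docs, vocab = model
    total_docs = docs[True] + docs[False]
    log_score = {}
    for label in (True, False):
        total_tokens = sum(counts[label].values())
        score = math.log((docs[label] + 1) / (total_docs + 2))
        for tok in tokenize(text):
            if tok in vocab:  # out-of-vocabulary tokens are ignored
                score += math.log((counts[label][tok] + 1) / (total_tokens + len(vocab)))
        log_score[label] = score
    return 1.0 / (1.0 + math.exp(log_score[False] - log_score[True]))


def bootstrap(seed, unlabeled, threshold=0.8, rounds=5):
    # Hypothetical self-training loop: promote confident predictions to labels.
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for text in pool:
            p = prob_relevant(model, text)
            if p >= threshold:
                confident.append((text, True))
            elif p <= 1.0 - threshold:
                confident.append((text, False))
            else:
                rest.append(text)  # still ambiguous, keep for a later round
        if not confident:
            break
        labeled.extend(confident)
        pool = rest
    return train(labeled)


# Toy data (invented) built around the ambiguous title "Lost".
seed = [
    ("watching lost tonight great episode", True),
    ("new episode of lost season finale", True),
    ("i lost my keys again", False),
    ("we lost the game today", False),
]
unlabeled = ["lost episode tonight was great", "lost my wallet today", "lost"]

model = bootstrap(seed, unlabeled)
print(prob_relevant(model, "great episode tonight"))
print(prob_relevant(model, "lost my keys"))
```

In this sketch the bare message "lost" never crosses either confidence threshold, so it stays unlabeled. That mirrors the ambiguity problem the abstract describes for title-only search.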

Similar resources

Language Modeling for Voice-Enabled Social TV Using Tweets

Social TV is a recent trend that integrates social media access and TV viewing. In this paper, we investigate approaches for building effective language models for a voice-enabled social TV application, where viewers can speak their social media updates while watching TV. We propose to take advantage of social media data, more specifically TV-related Twitter messages (tweets). The challenge is ...

Mining User Intents in Twitter: A Semi-Supervised Approach to Inferring Intent Categories for Tweets

In this paper, we propose to study the problem of identifying and classifying tweets into intent categories. For example, a tweet “I wanna buy a new car” indicates the user’s intent for buying a car. Identifying such intent tweets will have great commercial value among others. In particular, it is important that we can distinguish different types of intent tweets. We propose to classify intent ...

Marketing Ecosystem: The Dynamics of Twitter, TV Advertising, and Customer Acquisition

Social media, especially the micro-blogging network Twitter, have gained much popularity among users and have thus attracted attention from firms. Social media can serve as advertising media and platforms of online word-of-mouth, because they enable consumers to share their consumption experiences with others easily. When firms advertise their products and services, they usually do not rely on o...

Identifying Potential Adverse Drug Events in Tweets Using Bootstrapped Lexicons

Adverse drug events (ADEs) are medical complications co-occurring with a period of drug usage. Identification of ADEs is a primary way of evaluating available quality of care. As more social media users begin discussing their drug experiences online, public data becomes available for researchers to expand existing electronic ADE reporting systems, though non-standard language inhibits ease of a...

A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks

The rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Therefore, identifying the rumor language can be helpful in identifying it. The previous research has focused more on the contextual information to reply tweets and less on the content features of the original rumor to address the rumor detection problem. Most of the studies have been in...

Publication date: 2011